Prosper Loan Data by Alexander Rodriguez

Introduction: Credit is a ubiquitous aspect of modern life. It allows people to borrow money for expenses that they could not otherwise afford. Maintaining a good credit rating is important because it helps ensure that loan terms are favorable to the borrower. This project provides visualizations of the Prosper Loans dataset and the credit terms that borrowers receive based on their credit history. The dataset contains information on customer credit rating, loan status, loan amount, and many other variables that help explain which factors are considered when credit is approved. Below are the two overarching questions that the EDA process should answer.

Questions: What kinds of loan terms do people with excellent, good, fair, poor, and bad credit receive? What are the advantages to having a good credit score?

Univariate Plots Section

1. Check the credit rating data

##     A    AA     B     C     D     E    HR  NA's 
## 14551  5372 15581 18345 14274  9795  6935 29085


Comment: Missing values in the prosper_rating_a column were filled in based on the borrower's corresponding credit score, which greatly helps the analysis.

Observation: The mode for prosper_rating_a is a rating of ‘C’.
Also, many people with bad and poor credit receive loans.

2. Investigate borrower APR

## loan$prosper_rating_a: AA
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.01650 0.08325 0.09136 0.09641 0.10140 0.33170      14 
## -------------------------------------------------------- 
## loan$prosper_rating_a: A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.01315 0.12450 0.13710 0.13830 0.15040 0.36620       1 
## -------------------------------------------------------- 
## loan$prosper_rating_a: B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.01325 0.16650 0.17750 0.17970 0.19500 0.37630       2 
## -------------------------------------------------------- 
## loan$prosper_rating_a: C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.20040 0.22110 0.21840 0.24200 0.40240       3 
## -------------------------------------------------------- 
## loan$prosper_rating_a: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.24610 0.27470 0.26600 0.29510 0.41360       1 
## -------------------------------------------------------- 
## loan$prosper_rating_a: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.01657 0.30130 0.32440 0.31550 0.34620 0.41360       1 
## -------------------------------------------------------- 
## loan$prosper_rating_a: HR
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00864 0.30550 0.35640 0.32760 0.35800 0.51230       2 
## -------------------------------------------------------- 
## loan$prosper_rating_a: NC
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.06698 0.18820 0.24130 0.23500 0.28770 0.33750       1


Observation: The distribution of borrower APR appears to be approximately normal.

Observation: To provide some insight on the second question, “What are the advantages of having good credit?”: in the summary, the maximum APR is lowest for the highest credit rating, ‘AA’, and it rises as the credit rating decreases (‘D’ and ‘E’ share the same maximum).
Furthermore, the mean APR follows this trend as well.
The faceted histograms are also included as visualizations.
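
The trend described above can be checked directly from the printed summaries. The following is a minimal sketch in base R: the numbers are copied from the per-rating output in this section, while the data frame `apr` and the variable names are hypothetical and not part of the original analysis. The ‘NC’ group is left out because it is not part of the ordered rating scale.

```r
# Mean and maximum borrower APR per Prosper rating, copied from the
# summaries printed above (ordered from best to worst rating).
apr <- data.frame(
  rating = c("AA", "A", "B", "C", "D", "E", "HR"),
  mean   = c(0.09641, 0.13830, 0.17970, 0.21840, 0.26600, 0.31550, 0.32760),
  max    = c(0.33170, 0.36620, 0.37630, 0.40240, 0.41360, 0.41360, 0.51230)
)

# TRUE when every step down in rating raises the mean APR.
mean_strictly_increasing <- all(diff(apr$mean) > 0)

# The maximum ties between 'D' and 'E', so test for non-decreasing instead.
max_non_decreasing <- all(diff(apr$max) >= 0)

# Gap in mean APR between the worst and the best rating, in percentage points.
spread_pp <- 100 * (apr$mean[apr$rating == "HR"] - apr$mean[apr$rating == "AA"])
```

Under these figures, the gap in mean APR between ‘HR’ and ‘AA’ is a little over 23 percentage points.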


3. Investigate loan status

## [1] 0.169285
## [1] 0.1052502

Observation: The majority of the loans (about 83%) are either current or have been completed. The remaining 16.9% are in some other status, such as defaulted, charged off, or past due; charge-offs alone account for 10.5% of all loans. It would be interesting to determine which credit rating has the highest proportion in the charged-off and defaulted categories.
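
The two proportions above can be reproduced, to rounding, from the loan status counts tabulated in section 21. The sketch below is in base R; the totals were obtained by summing each status row across ratings (with the six "Past Due" rows combined), and the vector name `status_counts` is hypothetical. It suggests that the 16.9% figure corresponds to all loans that are neither Current nor Completed.

```r
# Loans per status, summed across Prosper ratings from the tables in
# section 21. The six "Past Due" rows are combined into one entry.
status_counts <- c(
  Cancelled = 5, Chargedoff = 11992, Completed = 38074, Current = 56576,
  Defaulted = 5018, FinalPaymentInProgress = 205, PastDue = 2067
)

total <- sum(status_counts)

# Share of loans that are neither Current nor Completed.
share_other <- 1 - sum(status_counts[c("Current", "Completed")]) / total

# Share of loans that were charged off.
share_chargedoff <- status_counts[["Chargedoff"]] / total

# Share of loans that are Defaulted or Chargedoff only.
share_default_or_chargeoff <-
  sum(status_counts[c("Defaulted", "Chargedoff")]) / total
```

With these counts, loans in the Defaulted and Chargedoff statuses alone make up about 14.9% of the total.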


4. Investigate loan term


Observation: Most people get a loan term of 36 months.
Are the longer-term loans for larger amounts of money borrowed?


5. Investigate debt to income ratio.

## [1] 0.6740069
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8555

Observation: 67.4% of borrowers have a debt-to-income ratio of 30% or less. Which factors matter most for loan approval when a person's debt-to-income ratio is above 30%? Why are people being approved for loans with debt-to-income ratios greater than 100%?


6. Investigate Monthly Loan Payment

## loan$prosper_rating_a: AA
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   152.1   300.7   327.9   472.8  2164.0 
## -------------------------------------------------------- 
## loan$prosper_rating_a: A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   164.6   314.7   337.0   475.6  2179.0 
## -------------------------------------------------------- 
## loan$prosper_rating_a: B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   173.8   307.0   333.2   454.8  2219.0 
## -------------------------------------------------------- 
## loan$prosper_rating_a: C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   147.6   267.4   302.2   396.2  2252.0 
## -------------------------------------------------------- 
## loan$prosper_rating_a: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   120.5   199.1   234.3   318.0  1385.0 
## -------------------------------------------------------- 
## loan$prosper_rating_a: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   115.3   162.0   172.7   199.5  1048.0 
## -------------------------------------------------------- 
## loan$prosper_rating_a: HR
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   86.85  135.70  130.60  173.70  774.70 
## -------------------------------------------------------- 
## loan$prosper_rating_a: NC
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   39.23   80.58   86.73  113.20  413.60


Observation: Borrowers rated AA, A, and B have very similar distributions, and the data shows that those borrowers are trusted with higher loan amounts and thus higher monthly payments.


7. Investigate employment status and employment duration.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   26.00   67.00   96.07  137.00  755.00    7626


Observation: As expected, most people are employed; however, some are not, and I wonder how they were able to get a loan. The distribution of employment_status_duration is positively skewed and appears approximately normal after a square root transformation.


8. Univariate Analysis

Summary: This data set contains 81 features. One of the main features that I chose to investigate was credit rating, which is stored as ‘prosper_rating_a’. Credit rating is an interesting feature to study since it summarizes a borrower’s creditworthiness. Throughout the plots that I created, there is an obvious advantage to having a credit rating of ‘AA’, ‘A’, or ‘B’, since these ratings are associated with lower APR and higher loan amounts.
Another feature of interest is the debt-to-income ratio, which ideally should be less than 30%; however, for some borrowers the ratio was above 100%, and for some as much as 1000%. I would like to understand why a person would be granted a loan with such a high debt-to-income ratio.
One feature in the dataset that will help support the investigation is the actual credit score, especially how the specific value of a credit score can affect credit terms.

Bivariate Plots Section

9. Investigate the relationship between Prosper Score and borrower apr

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    4.00    6.00    5.95    8.00   11.00   29085


Observation: There is a big advantage to having a Prosper Score of 10 vs. 1. However, there seems to be very little difference among scores of 7, 6, and 5.


10A. Investigate borrower_apr vs. credit score.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     592
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15630 0.20980 0.21880 0.28380 0.51230      26


Observation: There is a negative relationship between borrower_apr and CreditScoreRange. However, there is a lot of variability: some people with high credit scores (above 700) receive a loan with an APR above the average of about 22%.


10B. Look at the average borrower apr vs credit score to look at the trend.


Observation: The average borrower APR clearly has a negative relationship with both Prosper Score and credit score. The higher the score, the lower the average APR.


11. Does DebtToIncomeRatio help to predict ProsperScore?


Observation: The lowest debt-to-income buckets have the highest median Prosper Score, meaning that as the debt-to-income ratio increases, a borrower's score is expected to decrease.


12. Does Credit Score help predict ProsperScore?


Observation: Having a good credit score (above 750) does not guarantee a good Prosper Score (8, 9, or 10). In fact, some borrowers with a credit score above 800 have a Prosper Score below 4.


13. Determine the outliers in the Debt to income ratio category

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.600   0.650   0.760   1.995   1.160  10.010

Observation: The smallest outlier is a debt-to-income ratio of 0.60 (60%).
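
The 0.60 threshold is consistent with the common 1.5 × IQR rule applied to the quartiles reported in section 5 (0.14 and 0.32). The report does not state which rule was used, so the sketch below is an assumption, written in base R with hypothetical variable names and a small made-up vector `dti` standing in for the real column.

```r
# Quartiles of the debt-to-income ratio, as printed in section 5.
q1 <- 0.14
q3 <- 0.32

# Tukey's upper fence: values above Q3 + 1.5 * IQR count as outliers.
# (Assumption: the report does not say which outlier rule it applied.)
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr

# Made-up ratios for illustration only, including a missing value.
dti <- c(0.10, 0.22, 0.45, 0.60, 1.16, NA, 10.01)

# Keep the non-missing values that lie above the fence.
outliers <- dti[!is.na(dti) & dti > upper_fence]
```

The fence works out to 0.59, so with ratios recorded to two decimal places the smallest possible outlier is 0.60, which matches the minimum in the summary above.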


14. Investigate credit score and debt to income ratio


Observation: Borrowers with Credit scores that are lower than 600 can receive a loan but the data shows that their debt to income ratios must be lower than borrowers with higher scores.


15. Investigate Debt to income ratio and loan status.


Comment: A DebtToIncome.bucket variable was created using the five-number summary. The smallest outlier is 0.60 and the maximum value in the data is 10.01.
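
One way such a bucket variable could be built is with `cut()` in base R. This is a sketch under assumptions: the break points are taken to be the quartiles 0.14, 0.22, and 0.32 from section 5, the 0.60 outlier threshold from section 13, and the maximum of 10.01; the vector `dti` is made-up illustration data, and the original code may have differed.

```r
# Assumed break points: 0, the three quartiles, the outlier threshold,
# and the maximum value in the data.
breaks <- c(0, 0.14, 0.22, 0.32, 0.60, 10.01)

# Made-up debt-to-income ratios, including a missing value and a zero.
dti <- c(0.05, 0.14, 0.18, 0.30, 0.45, 2.50, NA, 0)

# cut() builds left-open, right-closed intervals by default, so 0.14
# lands in the first bucket and a ratio of exactly 0 gets no bucket (NA).
bucket <- cut(dti, breaks = breaks)

# Count the observations per bucket, keeping the NA group visible.
bucket_counts <- table(bucket, useNA = "ifany")
```

Passing `include.lowest = TRUE` to `cut()` would place ratios of exactly 0 in the first bucket instead of dropping them.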

Observation: Most people either pay off their loan or are in good standing.

16. Investigate employment_status_duration vs. borrower apr


Observation: Having a long employment status duration does not guarantee a low borrower APR.


17. Investigate listing category and Homeownership

## 
## FALSE  TRUE 
## 56459 57478


Observation: Roughly 50% of borrowers are homeowners. The majority of loans are for debt consolidation.


18. Bivariate Analysis

Summary: In this section I investigated the relationship of both Prosper Score and credit score with borrower APR. There is a clear advantage to having a high credit score and a high Prosper Score in getting a borrower APR below the average of roughly 22%. However, there are borrowers who received an above-average APR despite a high Prosper Score (8, 9, or 10).
Perhaps other factors, such as occupation and employment status duration, have an effect on borrower APR. Something of interest arose when I looked at employment status duration vs. borrower APR: there is no observable relationship between having a long employment status duration and receiving a favorable borrower APR. The strongest relationship that I found in this section was between borrower debt-to-income ratio and credit score range. The data indicates that borrowers with credit scores above 800 have lower debt-to-income ratios, and borrowers with scores below 500 must have a debt-to-income ratio that is less than the average.

Multivariate Plots Section

19. Investigate Monthly Loan Payment and Customer Principal Payments


Observation: I notice that borrowers with the lowest debt to income ratio have monthly payment compared to their loan amount.


20. Investigate prosper score, borrower apr and debt-to-income-ratio

## [1] 303

Observation: This plot suggests that a borrower APR below 10% goes mostly to borrowers in the lowest DebtToIncome bucket, (0,0.14]. However, some borrowers (303 in total) in the (0.32,0.60] bucket also received a borrower APR in that range.


21. Loan status and Monthly payment colored by prosper rating

## loan$loan_status: Cancelled
## AA  A  B  C  D  E HR NC 
##  0  1  0  1  0  0  3  0 
## -------------------------------------------------------- 
## loan$loan_status: Chargedoff
##   AA    A    B    C    D    E   HR   NC NA's 
##  402  852 1409 2016 2738 2077 2457   35    6 
## -------------------------------------------------------- 
## loan$loan_status: Completed
##   AA    A    B    C    D    E   HR   NC NA's 
## 4669 5708 5772 6586 7318 4184 3673   42  122 
## -------------------------------------------------------- 
## loan$loan_status: Current
##    AA     A     B     C     D     E    HR    NC 
##  3551 10755 11891 14001  7920  5558  2900     0 
## -------------------------------------------------------- 
## loan$loan_status: Defaulted
##   AA    A    B    C    D    E   HR   NC NA's 
##  221  362  581  863  966  858 1100   64    3 
## -------------------------------------------------------- 
## loan$loan_status: FinalPaymentInProgress
## AA  A  B  C  D  E HR NC 
## 13 34 39 38 28 32 21  0 
## -------------------------------------------------------- 
## loan$loan_status: Past Due (>120 days)
## AA  A  B  C  D  E HR NC 
##  0  0  2  4  4  4  2  0 
## -------------------------------------------------------- 
## loan$loan_status: Past Due (1-15 days)
##  AA   A   B   C   D   E  HR  NC 
##   7  62 109 203 190 136  99   0 
## -------------------------------------------------------- 
## loan$loan_status: Past Due (16-30 days)
## AA  A  B  C  D  E HR NC 
##  3 14 44 61 53 50 40  0 
## -------------------------------------------------------- 
## loan$loan_status: Past Due (31-60 days)
## AA  A  B  C  D  E HR NC 
##  7 29 36 93 84 67 47  0 
## -------------------------------------------------------- 
## loan$loan_status: Past Due (61-90 days)
## AA  A  B  C  D  E HR NC 
##  2 24 38 78 58 55 58  0 
## -------------------------------------------------------- 
## loan$loan_status: Past Due (91-120 days)
## AA  A  B  C  D  E HR NC 
##  6 25 49 50 68 63 43  0

Observation: There are a lot of blue and red dots in the defaulted categories. Conversely, there are a lot of green and violet dots in the completed and current categories. Lastly, there are cases where borrowers with good credit default on their loans and are even charged off.


22A. How do credit score and debt to income ratio affect borrower apr?


Observation: A low debt-to-income ratio combined with a high credit score appears to earn a low borrower APR.

22B. Linear model for Credit score and borrower apr

## 
##  Pearson's product-moment correlation
## 
## data:  CreditScoreRangeLower and loan$borrower_apr
## t = -160.2137, df = 113344, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4344422 -0.4249487
## sample estimates:
##        cor 
## -0.4297073

Observation: The correlation test shows a negative relationship between borrower APR and credit score. The Pearson correlation coefficient is -0.4297, which indicates a weak-to-moderate negative correlation.
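
As a consistency check on the output above, the correlation coefficient can be recovered from the reported t statistic and degrees of freedom using r = t / sqrt(t² + df). The sketch below is in base R; the variable names are hypothetical, and only the two numbers copied from the test output come from the report.

```r
# Values copied from the Pearson correlation test output above.
t_stat <- -160.2137
df <- 113344

# For a Pearson test, t = r * sqrt(df) / sqrt(1 - r^2), which rearranges to:
r <- t_stat / sqrt(t_stat^2 + df)

# In a simple linear regression of APR on credit score, r^2 is the share
# of the variance in APR that credit score alone accounts for.
r_squared <- r^2
```

This gives r ≈ -0.4297, matching the sample estimate, and r² ≈ 0.185, so credit score on its own would account for under a fifth of the variance in borrower APR in a straight-line fit.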


22C. Credit score and apr with percentile and average.


Observation: This plot shows that there is a lot of variability in Borrower APR. More factors than just Credit Score have an effect on Borrower APR, such as Debt-To-Income ratio. However, having a high Credit Score does have an advantage: the lines show that, on average, the lowest Borrower APR is given to the people with the highest Credit Scores.
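
The average and percentile lines in a plot like this are per-score summaries of APR. The sketch below shows one way to compute them with `tapply()` in base R. The data frame `loans` and its values are made up for illustration, and the 90th percentile is an assumption; the report itself only names the 10th percentile and the average.

```r
# Made-up data: three credit scores with five loans each.
loans <- data.frame(
  credit_score = rep(c(640, 700, 760), each = 5),
  borrower_apr = c(0.30, 0.32, 0.28, 0.35, 0.25,
                   0.20, 0.22, 0.18, 0.26, 0.14,
                   0.10, 0.12, 0.08, 0.15, 0.05)
)

# Mean APR for each credit score.
mean_apr <- tapply(loans$borrower_apr, loans$credit_score, mean)

# 10th and 90th percentile of APR for each credit score.
p10_apr <- tapply(loans$borrower_apr, loans$credit_score,
                  quantile, probs = 0.10, names = FALSE)
p90_apr <- tapply(loans$borrower_apr, loans$credit_score,
                  quantile, probs = 0.90, names = FALSE)
```

Each result is a named vector indexed by credit score, so the three can be drawn as separate lines over the scatter of individual loans.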


23. Multivariate Analysis

Summary: The features of interest in this data set are Borrower APR and the variables that affect the APR positively or negatively. For example, I want to know how the Credit Score, Prosper Score, and Prosper Rating affect the APR. I noticed a lot of variability, such as borrowers with a high debt-to-income ratio who received a low borrower APR. However, on average, a borrower with a good credit score should expect to get a low Borrower APR.
One interaction that I found interesting is in section #21. I looked at this plot to consider whether monthly payment has an effect on loan status. I thought that perhaps a person with a high monthly payment can overextend themselves, causing them to default. However, people with a low credit rating such as ‘E’ are trusted only with low monthly payments, and that is still not enough to prevent them from defaulting on the loan.
I understand that other factors, such as Debt-to-Income Ratio and employment status, come into play in a person’s ability to repay a loan.

Models: The data set has a lot of variability, and thus there is only a weak-to-moderate negative correlation between Borrower APR and Credit Score.


Final Plots and Summary


24. Final Plot #1: Original Loan Amount by Prosper Rating

Description for Final Plot #1: This plot helps answer the question about the benefits of having a good credit score (in this case, a good Prosper Rating). Borrowers with a Prosper Rating of ‘AA’, ‘A’, or ‘B’ clearly have a higher median loan amount and are even trusted with loan amounts above $30,000. Ratings of ‘C’ and below all have lower median loan amounts.

25. Final Plot #2: Borrower Apr vs. Credit Score

Description for Final Plot #2: This plot shows that there is a lot of variability in Borrower APR. More factors than just Credit Score have an effect on Borrower APR, such as Debt-To-Income ratio. However, having a high Credit Score does have an advantage: the lines show that, on average, the lowest Borrower APR is given to the people with the highest Credit Scores. Furthermore, the blue line (10th percentile) shows that at a credit score of about 750, those borrowers received an APR below 10%.

26. Final Plot #3: Monthly Loan Payment vs Loan Status, colored by prosper rating


Description for Final Plot #3: About 83% of the loans are either current or have been completed; the remaining 16.9% are in another status such as defaulted, charged off, or past due, and charge-offs alone account for 10.5% of all loans. This plot allows a more granular view of loan status. For example, only about 3.4% of charged-off loans had an ‘AA’ Prosper Rating, whereas about 20.5% had an ‘HR’ Prosper Rating. Lastly, this plot helps visualize how monthly payment is part of the loan status equation. As expected, the median monthly payment for an ‘AA’ Prosper Rating is higher than the median monthly payment for an ‘E’ rating. This helps tell the story of how much risk Prosper Loans is willing to take on borrowers. Borrowers with a high Prosper Rating are trusted with higher monthly payments and are more likely to complete their loans.
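
The charge-off shares by rating can be recomputed from the Chargedoff row of the loan status table in section 21. The sketch below is in base R; the vector name `chargedoff` and the label `unrated` for the NA column are hypothetical.

```r
# Charged-off loans per Prosper rating, copied from the Chargedoff row of
# the loan status table. "unrated" stands for the NA's column.
chargedoff <- c(AA = 402, A = 852, B = 1409, C = 2016, D = 2738,
                E = 2077, HR = 2457, NC = 35, unrated = 6)

# Each rating's share of all charged-off loans, in percent.
share_pct <- 100 * chargedoff / sum(chargedoff)

# The rating with the largest share of charge-offs.
largest <- names(which.max(share_pct))
```

With these counts, ‘AA’ accounts for roughly 3.35% of charge-offs and ‘HR’ for roughly 20.49%, while the single largest share belongs to ‘D’ at about 22.8%.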


27. Reflection

Reflection: The introduction posed two overarching questions: What kinds of loan terms do people with excellent, good, fair, poor, and bad credit receive? What are the advantages of having good credit? To answer these questions I created multiple plots to display and understand the data. The most important factors are: What is my credit score or rating? How much can I borrow (Loan Amount)? How much will it cost to borrow the money (Borrower APR)?

In the data I found that no single factor can predict Borrower APR. For example, there is substantial variability when comparing Borrower APR vs. Credit Score; however, on average, as the Credit Score increases, the Borrower APR decreases and the amount of the loan increases. This means that the advantages of having good credit are that it costs less to borrow money and that the borrower is trusted with higher loan amounts. I was surprised that even borrowers with a Prosper Rating of ‘AA’ or ‘A’ pose some risk to the lender, since some of them default on their loans and are even charged off.

One of the main struggles was the complexity of the dataset, since it contains more than 80 variables. A lot of exploration was necessary to understand which variables are the most relevant. In fact, I found that a deeper analysis is needed to find which variables have the most impact on Borrower APR. The next step in this exploration would be a multiple regression model in which variables such as Debt-To-Income Ratio, Prosper Score, Prosper Rating, Credit Score, Employment Duration, and Homeownership are used to predict Borrower APR and Loan Amount. Lastly, no time-series plots were included in the analysis. For further study I would like to look at how loan amounts varied over time.